ABSTRACT 

Wojtak et al have stacked 7,800 clusters from the SDSS survey in redshift space. They 
find a small net blue-shift for the cluster galaxies relative to the brightest cluster 
galaxies, which agrees quite well with the gravitational redshift predicted from GR. 
Zhao et al. have pointed out that, in addition to the gravitational redshift, one would 
expect to see transverse Doppler (TD) redshifts, so $\langle\delta z\rangle = -\langle\Phi\rangle + \langle\beta^2\rangle/2$ with $\beta$ the 
3D source velocity in units of c, and that these two effects are generally of the same 
order. Here we show that there are other corrections that are also of the same order 
of magnitude. The fact that we observe galaxies on our past light cone results in a 
bias such that more of the galaxies observed are moving away from us in the frame 
of the cluster than are moving towards us. This causes the observed average redshift 
to be $\langle\delta z\rangle = -\langle\Phi\rangle + \langle\beta^2\rangle/2 + \langle\beta_x^2\rangle$, where $\beta_x$ is the line-of-sight velocity. That is if we 
average over galaxies with equal weight. If the galaxies in each cluster are weighted 
by their fluence, or equivalently if we do not resolve the moving sources, and make an 
average of the mean redshift giving equal weight per photon, the observed redshift is 
$\langle\delta z\rangle = -\langle\Phi\rangle - \langle\beta^2\rangle/2$, so the kinematical effect is then opposite to the usual transverse 
Doppler effect. In the WHH experiment, the weighting is a step-function because of 
the flux-limit for inclusion in the spectroscopic sample and the result is different again, 
and depends on the details of the luminosity function and the SEDs of the galaxies. 
Including these effects substantially modifies the blue-shift profile. We identify some 
potential biases in the dynamical analysis of stacked clusters. We show that in-fall and 
out-flow have very small effect over the relevant range of impact parameters. 
1 INTRODUCTION 

Wojtak, Hansen and Hjorth (2011, hereafter WHH) have 
measured the gravitational redshift effect in clusters of 
galaxies. They stacked 7,800 massive clusters selected from 
the GMBCG cluster sample (Hao, J., et al, 2010) derived 
from the SDSS DR7 survey data (Abazajian et al., 
2009) in redshift space, using coordinates of the brightest 
cluster galaxy (BCG) as the origin. They fit the cluster- 
frame redshift distributions, determined at a range of impact 
parameters, to a linear ramp to describe the foreground and 
background galaxies plus a quasi-Gaussian distribution to 
describe the cluster, and find that the centres of the cluster 
components have a small net blue-shift $\delta z \simeq -10$ km/s/$c$; a 
remarkable achievement since the galaxy clusters have 
velocity dispersions of order 600 km/s. 

A blue-shift would be expected in GR since the light 
from the BCGs, which are thought to reside close to the 
centres of clusters, will have climbed out from deeper in the 
cluster potential well than the light from the majority of 
the galaxies, and the amplitude of the effect appears to be 
broadly consistent with their estimates of the gravitational 
redshift obtained using a mean cluster mass distribution 
determined from the observed velocity dispersion. 

WHH suggested that the result is in conflict with the 
predictions of TeVeS modified gravity theory (Bekenstein, 
2004). However, agreement with the potential inferred from 
the kinematics of non-relativistic particles like galaxies is 
expected in any metric theory of gravity since both gravita- 
tional redshifts and particle motions are determined by the 
time component of the metric; what this type of measure- 
ment tests is the validity of the equivalence principle (Will, 
2006; Bekenstein and Sanders, 2012; Zhao et al, 2012, 
hereafter ZPL). This type of observation can, however, provide 
constraints on theories in which long-range non-gravitational 
interactions between dark matter particles augment 
gravity on cluster scales (e.g. Gradwohl & Frieman, 
1992; Gubser & Peebles, 2004; Farrar & Rosen, 2007). 

WHH compare the frequency shift with the estimate 
for $\langle\Phi\rangle_{r_\perp}/c^2$ where the averaging is along a line of sight 
with impact parameter $r_\perp$. This would be appropriate if 
the light were emitted by non-inertial observers on a rigid, 
non-rotating, lattice in a state of rest with respect to the 
cluster. It is also valid, to a good approximation, for 
observations of the redshift of X-ray lines from heavy ions 
in the intra-cluster medium (Broadhurst & Scannapieco, 
2000). But, as emphasised by ZPL, this is not correct when 
the light emanates from galaxies that are in free-fall. One 
way to obtain the observed redshift in this situation is to 
use local Lorentz boosts to give the Doppler shift between 
each emitting galaxy and its neighbouring lattice-based ob- 
server living in the rest-frame of the cluster (though there 
are other constructions one could use — see Bunn & Hogg 
(2009) for an in-depth discussion). If we set up coordinates 
such that the distant observer lives at positive $x$, the 
energy of a photon in the emitting galaxy's frame relative 
to the cluster rest-frame is $E_G = \gamma(1-\beta_x)E_{RF}$ where 
$\gamma = (1-\beta^2)^{-1/2}$ and $\beta = v/c$ with $\beta_x$ the component 
towards the observer, so, up to second order in $\beta$, the redshift 
is $1 + z = E_G/E_{RF} = 1 - \beta_x + \beta^2/2$. Adding the gravitational 
redshift difference yields the average redshift, given a 
phase-space density (PSD) for the galaxies $\rho(\mathbf{r},\boldsymbol{\beta},t)$, of 

$$\langle z\rangle = \frac{\int d^3r \int d^3\beta\, \rho(\mathbf{r},\boldsymbol{\beta},t)\,(-\beta_x + \beta^2/2 - \Phi/c^2)}{\int d^3r \int d^3\beta\, \rho(\mathbf{r},\boldsymbol{\beta},t)}. \qquad (1)$$

Note that the redshifts here are all relative redshifts between 
observers and emitters in the vicinity of the cluster, not the 
redshift actually observed; i.e. $1 + z = (1 + z_{\rm obs})/(1 + z_{\rm cl})$. 
If the cluster is virialised, the PSD will be an even function 
of velocity so the mean of the line-of-sight velocity $\beta_x$ 
will vanish, and one would conclude that the mean redshift 
difference is 

$$\langle\delta z\rangle = \langle z_G - z_{BCG}\rangle = \langle\beta_G^2 - \beta_{BCG}^2\rangle/2 - \langle\Phi_G - \Phi_{BCG}\rangle/c^2 \qquad (2)$$

where now, following ZPL, allowance is made for the fact 
that the BCG will, in general, not be at rest at the centre 
of the cluster. Thus there is a positive contribution to the 
redshift, the transverse Doppler (TD) effect, that is opposite 
in sign to the gravitational redshift (GR) effect for rest- 
frame emitters (it being assumed here that the BCGs are 
on considerably lower energy orbits than the general cluster 
population) and, as emphasised by ZPL this effect will, quite 
generally, be of the same order as the gravitational redshift 
for a bound system by virtue of the virial theorem. 
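
As a rough numerical check of the second-order expansion used above, the following Python sketch compares the exact Doppler factor $\gamma(1-\beta_x)$ with $1 - \beta_x + \beta^2/2$ and evaluates the size of the TD term. The function names, the isotropic relation $\langle\beta^2\rangle = 3\sigma_{\rm los}^2/c^2$ and the illustrative dispersion of 600 km/s are assumptions made for this example, not quantities taken from the WHH analysis.

```python
import math

C_KMS = 299792.458  # speed of light in km/s


def exact_shift(beta_x, beta_perp):
    """Exact z = gamma*(1 - beta_x) - 1 for a source with velocity components
    beta_x (towards the observer) and beta_perp (transverse), in units of c."""
    beta2 = beta_x ** 2 + beta_perp ** 2
    gamma = 1.0 / math.sqrt(1.0 - beta2)
    return gamma * (1.0 - beta_x) - 1.0


def second_order_shift(beta_x, beta_perp):
    """Second-order approximation z = -beta_x + beta^2/2."""
    beta2 = beta_x ** 2 + beta_perp ** 2
    return -beta_x + 0.5 * beta2


def transverse_doppler_kms(sigma_los_kms):
    """TD term <beta^2>/2, expressed in km/s, assuming isotropic orbits
    so that <beta^2> = 3 * sigma_los^2 / c^2 (an assumption of this sketch)."""
    beta2 = 3.0 * (sigma_los_kms / C_KMS) ** 2
    return 0.5 * beta2 * C_KMS


if __name__ == "__main__":
    print(exact_shift(0.002, 0.003), second_order_shift(0.002, 0.003))
    print(transverse_doppler_kms(600.0))
```

For a line-of-sight dispersion of 600 km/s the TD term is a little under 2 km/s in velocity units, which is why it cannot be neglected next to a signal of order 10 km/s.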

The point of this paper is to show that there are other 
corrections of the same order of magnitude. One arises from 
the fact that we observe the galaxies on our past light cone 
and this causes a bias such that we see more galaxies moving 
away from us than moving towards us. We show in §2 that 
this gives an additional redshift $\langle\beta_x^2\rangle$. 

But that is only true if each source galaxy is weighted 
equally in the averaging. If we apply any weighting based 
on galaxy luminosity then we also need to allow for the 
special relativistic beaming effect. In §3 we show that if we 
do not resolve the internal motions, but make an average 
that gives equal weight per observed photon, the resulting 
redshift is just the opposite of the transverse Doppler ef- 
fect. That beaming and time-dilation would have an effect 
on gravitational redshift measurements using X-ray obser- 
vations was noted by Broadhurst & Scannapieco (2000), but 
in that application it is a much smaller effect so was ignored. 

In the WHH experiment the weighting was a step- 
function imposed by the flux-limit for inclusion in the spec- 
troscopic sample. We calculate the effect of this in §4. This 
turns out to be the dominant kinematic effect. 

In §5 we first attempt to clarify some issues concerning 
dynamical analysis of a composite cluster formed by stacking 
a heterogeneous collection of clusters. We then apply these 
results together with the observed velocity dispersion profile 
to generate predictions for the net effect and compare with 
the observations. 

In §6 we consider the effect of infall and outflow, which 
we find to have very little impact on the measurements, and 
in an appendix we develop the formalism for deriving, from 
numerical or analytical models, the predicted distribution of 
observed redshifts in order to facilitate a more direct com- 
parison with the current and future observations. 



2 PHASE-SPACE DENSITY ON THE PAST 
LIGHT-CONE 

One might imagine that allowing for the light travel time 
would be simply a matter of replacing $\rho(\mathbf{r},\boldsymbol{\beta},t)$ in equation 
(1) by $\rho(\mathbf{r},\boldsymbol{\beta},t = x/c)$, in which case there would be no effect 
in the virialised region since for a stable, relaxed, system the 
PSD is independent of time. We are choosing the origin of 
time here to be the time the light we observe left the center 
of the cluster. 

But this is not correct. While the PSD is invariant under 
Lorentz boosts and also along the trajectories of the parti- 
cles, it has a non-trivial transformation from rest-frame to 
light-cone (LC) coordinates: 

$$\rho_{LC}(\mathbf{r},\boldsymbol{\beta}) = (1-\beta_x)\,\rho_{RF}(\mathbf{r},\boldsymbol{\beta}). \qquad (3)$$

This means that the PSD for a virialised system viewed on 
the light cone is not an even function of velocity but has 
a small asymmetry which results in a non-vanishing of the 
mean of the line of sight component of the velocity. 

One way to see how this arises is to consider taking 
a photograph of a swarm of particles where, in any region 
of space, there are as many particles moving towards us as 
away from us. The particles that we will see in a small cu- 
bical cell in space are not the same as the particles that 
occupy the cell at the moment the light passes through the 
centre of the cell. As the past light cone of the event of our 
opening the shutter sweeps towards us through the cell it 
will overtake more particles that are moving away from us 
than are moving towards us. The result is that more par- 
ticles in the photograph will have positive radial velocities 
than negative ones. More quantitatively, if we have a pair of 
particles with the same $x$-component of velocity $\beta_x$ with 
separation in the rest frame of $dx_{RF}$ then on our past light cone 
they have a separation $dx_{LC} = dx_{RF}/(1-\beta_x)$ and so the 
density (ordinary space density or phase-space density) on 
the light cone gains a factor $1-\beta_x$. Note that this is a purely 
Newtonian plus light-travel-time effect, and has nothing to 
do with Lorentz-Fitzgerald length contraction which causes 
the density of particles to depend on the state of motion of 
the observer. It is the same effect that causes a runner on a 
trail to meet more hikers coming towards her than going in 
the same direction. 
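
The counting argument can be illustrated with a short Python sketch. It places a uniform stream of particles on a lattice at $t = 0$, all moving with the same $\beta_x$, finds where each one is intersected by the light cone $t = x/c$, and counts how many are photographed inside a fixed cell. The lattice spacing, cell size and function name are arbitrary choices made for this illustration; units are such that $c = 1$.

```python
def count_on_light_cone(beta_x, spacing=1.0e-3, half_width=1.0):
    """Count particles of a uniform stream seen inside |x| < half_width
    on the light cone t = x/c (with c = 1).

    Particles sit at x0 = k * spacing at t = 0 and all move with velocity
    beta_x (positive towards the observer at +x). A particle is photographed
    where x = x0 + beta_x * x, i.e. at x = x0 / (1 - beta_x).
    Intended for |beta_x| < 0.5 so that the lattice covers the cell.
    """
    count = 0
    k_max = int(2 * half_width / spacing)
    for k in range(-k_max, k_max + 1):
        x_lc = k * spacing / (1.0 - beta_x)
        if abs(x_lc) < half_width:
            count += 1
    return count


if __name__ == "__main__":
    n_toward = count_on_light_cone(0.1)   # moving towards the observer
    n_away = count_on_light_cone(-0.1)    # moving away from the observer
    print(n_toward, n_away, n_toward / n_away)
```

The ratio of the two counts approaches $(1-\beta_x)/(1+\beta_x)$, as expected if each stream is weighted by $1-\beta_x$.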

This result can easily be verified in the case of a toy 
model of a particle oscillating back and forth in a 1D 
parabolic potential well $\Phi(x) = \omega^2x^2/2$. The trajectory 
is $x(t) = a\cos(\omega t + \phi)$, where $\phi$ is the phase. The 
velocity in the rest-frame of the potential trough is 
$\beta = -(a\omega/c)\sin(\omega t + \phi)$ which, on the light cone $t = x/c$, is 
$\beta = -(a\omega/c)\sin(\omega x/c + \phi) \simeq -(a\omega/c)\sin\phi - (a\omega/c)^2\cos^2\phi$. The 
average of the first term (over phase, or equivalently over time) 
vanishes, but the second term is always negative and averages 
to $-\langle\beta^2\rangle$, in agreement with 
$\langle\beta\rangle = \int dx \int d\beta\,\rho(x,\beta)(1-\beta)\beta / \int dx \int d\beta\,\rho(x,\beta)(1-\beta) = -\langle\beta^2\rangle$ 
with the rest frame PSD an even function of velocity. 
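
The toy model is simple enough to check numerically. The sketch below, written in dimensionless variables $x' = x\omega/c$ and $\epsilon = a\omega/c$ (the symbol $\epsilon$ and the function name are introduced here only for illustration), solves for the point at which each oscillator is intersected by the light cone and averages the observed velocity over uniformly distributed phases.

```python
import math


def light_cone_moments(eps, n_phase=2000):
    """Phase-averaged <beta> and <beta^2> seen on the light cone t = x/c
    for oscillators x(t) = a*cos(omega*t + phi), with eps = a*omega/c.

    In dimensionless form the light-cone crossing satisfies
    x' = eps * cos(x' + phi), solved here by fixed-point iteration
    (a contraction for eps < 1).
    """
    total = 0.0
    total_sq = 0.0
    for k in range(n_phase):
        phi = 2.0 * math.pi * (k + 0.5) / n_phase
        x = 0.0
        for _ in range(50):
            x = eps * math.cos(x + phi)
        beta = -eps * math.sin(x + phi)
        total += beta
        total_sq += beta * beta
    return total / n_phase, total_sq / n_phase


if __name__ == "__main__":
    mean_beta, mean_beta_sq = light_cone_moments(0.01)
    print(mean_beta, -mean_beta_sq)
```

With equal weight per particle the mean velocity comes out equal to minus the mean square velocity, i.e. a net recession as seen by the observer at positive $x$.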

For this toy model, and for particles uniformly distributed 
in phase, the PSD is zero except on a circle in 
$x'$-$\beta$ space (where $x' = x\omega/c$ is the dimensionless displacement), 
and $\rho(x',\beta,t)$ vanishes except on a cylinder around 
the $t$-axis. When we slice this cylinder on the light cone, 
the particles also live on a circle, but their density is 
non-uniform. 

A parabolic 1-D potential is not very realistic, but the 
result is quite general. For particles orbiting in any static 
potential well, the average of the instantaneous line of sight 
velocity, either over time for one particle or over phase for 
a distribution, will average to zero, but the observed veloc- 
ity will contain an extra term which is the light propagation 
time x/c times the acceleration of the particle, and the accel- 
eration and position are anti-correlated in a gravitationally 
bound system, so this does not average to zero. Since the 
acceleration is the gradient of the potential, it is guaranteed 
that the average of the line-of-sight velocity will be of the 
same order as $\Phi/c$. 

WHH measured the mean redshift difference for galaxies 
at a range of projected distances $r_\perp$ from the cluster 
center. The appropriate thing to compare with a PSD from 
a dynamical model or output of a numerical simulation is, 
with suitable normalisation, 

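$$\langle\delta z\rangle_{r_\perp} = \int dx \int d^3\beta\, \rho_{RF}(\mathbf{r},\boldsymbol{\beta},t = x/c)\,(1-\beta_x)\,(-\beta_x + \beta^2/2 - \Phi/c^2). \qquad (4)$$

In the virialised region, this gives 

$$\langle z - z_{BCG}\rangle = \langle\beta_x^2 - \beta_{x,BCG}^2\rangle + \langle\beta^2 - \beta_{BCG}^2\rangle/2 - \langle\Phi_G - \Phi_{BCG}\rangle/c^2. \qquad (5)$$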

For isotropic orbits, the new term is 2/3 of the size of the TD 
effect and is of the same sign. Note that the asymmetry in 
the PSD applies to BCG as well; BCG line of sight velocities 
will also be biased to be positive with respect to the cluster 
centre of mass. 
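
The factor of 2/3 quoted for isotropic orbits can be verified with a few lines of Python. The sketch averages $-\beta_x$ over isotropic directions using the light-cone weight $1-\beta_x$ and compares the result with the TD term $\beta^2/2$ for a single speed $\beta$. The function name and the midpoint quadrature are choices made for this example only.

```python
def light_cone_and_td_terms(beta, n_mu=2000):
    """Return (<-beta_x> with light-cone weighting, TD term beta^2/2)
    for sources of speed beta moving in isotropically distributed directions.

    mu is the cosine of the angle between the velocity and the line of sight,
    so beta_x = beta * mu, and the weight per source is (1 - beta_x).
    """
    weight_sum = 0.0
    shift_sum = 0.0
    for k in range(n_mu):
        mu = -1.0 + 2.0 * (k + 0.5) / n_mu  # midpoint rule on [-1, 1]
        beta_x = beta * mu
        weight = 1.0 - beta_x
        weight_sum += weight
        shift_sum += weight * (-beta_x)
    light_cone_term = shift_sum / weight_sum  # tends to beta^2 / 3
    td_term = 0.5 * beta ** 2
    return light_cone_term, td_term


if __name__ == "__main__":
    lc, td = light_cone_and_td_terms(0.003)
    print(lc / td)
```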



3 UNRESOLVED SOURCES 

The foregoing analysis assumes that the redshift offsets are 
determined from a catalog of angular positions and redshifts, 
thus effectively giving equal weight per galaxy. 

But when we cannot resolve the sources, such as when 
we try to allow for the kinematics of stars in BCGs, or, 
potentially, for low resolution HI observations of clusters, 
we are averaging with equal weight per observed photon, 
and this changes the effect. 

Consider a source that emits photons of fixed energy 
$E = E_0$ isotropically in its rest-frame in a burst as it 
passes the origin of space moving at velocity $\beta$ along the 
$x$-axis. Boosting the photon 4-momenta into the 'laboratory 
frame' (denoted below by primed coordinates) one 
finds that a distant observer measures an energy 
$E(\mu') = E_0/\gamma(1-\beta\mu') = E_0\gamma(1+\beta\mu)$ where $\mu$ is the cosine of the 
angle between the $x$-axis and the photon direction. Comparing 
the 3-momenta yields $\mu' = (\beta+\mu)/(1+\beta\mu)$, and therefore 
the Jacobian of the transformation from observed to 
source-frame solid angles is $d\mu'/d\mu = 1/\gamma^2(1+\beta\mu)^2 = (E_0/E)^2$, 
and since $n(\mu')d\mu' = n_0 d\mu$, the density of photons per unit 
solid angle is $n(\mu') = n_0(E/E_0)^2 = n_0/\gamma^2(1-\beta\mu')^2$. This 
is the familiar relativistic beaming effect. The energy is a 
function of lab-frame direction, and one finds that the 
probability distribution for energy is $P(E) \propto n(\mu')d\mu'/dE(\mu')$ 
which is flat from $E = E_0/\gamma(1+\beta)$ to $E = E_0/\gamma(1-\beta)$, 
and zero otherwise. 

This is the probability distribution for random direction 
to the observer, or, equivalently, the probability distribution 
for a single observer viewing radiation from sources at the 
origin moving in random directions. It is also the same dis- 
tribution one would find for a particle oscillating back and 
forth in a box, or for the emission from particles in a region 
of space if they are moving in randomly oriented directions 
though all at the same speed. The mean photon energy is 
readily found to be $\langle E\rangle = \int dE\,E\,P(E) = \gamma E_0$; a result that 
could have been anticipated since whatever rest-mass $\delta m_0$ 
the source used to create the radiation has energy in the 
lab-frame $\gamma\,\delta m_0 c^2$. 
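
The aberration and Doppler relations above can be cross-checked numerically. The following Python sketch (function names are illustrative) confirms that the energy written in terms of the rest-frame direction cosine $\mu$ agrees with the expression in terms of the lab-frame cosine $\mu'$, and that averaging over isotropic emission in the rest frame gives a mean photon energy of $\gamma E_0$.

```python
import math


def gamma_factor(beta):
    """Lorentz factor for speed beta (in units of c)."""
    return 1.0 / math.sqrt(1.0 - beta ** 2)


def aberrate(mu, beta):
    """Lab-frame direction cosine mu' for rest-frame direction cosine mu."""
    return (mu + beta) / (1.0 + beta * mu)


def lab_energy_from_rest_angle(mu, beta, e0=1.0):
    """E = gamma * E0 * (1 + beta * mu)."""
    return e0 * gamma_factor(beta) * (1.0 + beta * mu)


def lab_energy_from_lab_angle(mu_lab, beta, e0=1.0):
    """E = E0 / (gamma * (1 - beta * mu'))."""
    return e0 / (gamma_factor(beta) * (1.0 - beta * mu_lab))


def mean_photon_energy(beta, e0=1.0, n_mu=1000):
    """Average lab-frame energy over photons emitted isotropically in the
    rest frame, i.e. with mu uniform on [-1, 1]."""
    total = 0.0
    for k in range(n_mu):
        mu = -1.0 + 2.0 * (k + 0.5) / n_mu
        total += lab_energy_from_rest_angle(mu, beta, e0)
    return total / n_mu


if __name__ == "__main__":
    print(mean_photon_energy(0.3), gamma_factor(0.3))
```

Because the lab-frame energy is linear in $\mu$, and $\mu$ is uniformly distributed, the flat form of $P(E)$ follows immediately.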

For a distribution of velocities we need to allow for the 
time-dilation effect: if the sources are identical and all emitting 
photons at a fixed rate in their rest frame, the interval between 
emission events in the observer-frame will be longer by 
a factor $\gamma$, so the number of photons observed per unit time 
from sources with gamma factor $\gamma$ is $d\dot n(\gamma) \propto P(\gamma)\,d\gamma/\gamma$ and 
the average energy per photon is then 



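$$\langle E\rangle = \frac{\int d\dot n\,\gamma E_0}{\int d\dot n} = E_0\,\frac{\int d\gamma\,P(\gamma)}{\int d\gamma\,P(\gamma)/\gamma} = E_0\langle\gamma^{-1}\rangle^{-1}. \qquad (6)$$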



We see here that the received energy per unit time is just 
the sum of the power of the sources, but the number of 
received photons per unit time in the observer frame has a 
$1/\gamma$ dependence. 

For unresolved sources then, the effect of the internal 
kinematics is to introduce a blue-shift. How does this square 
with the result obtained in the previous section where we 
found that the effect for resolved sources was a red-shift 
that, once we allowed for light cone effects, was actually 
larger than the transverse Doppler effect? To see that the 
two calculations are consistent, and obtain a useful check of 
the validity of the light-cone effect, we now show that the 
resolved-source analysis reproduces the result for unresolved 
sources, as of course it should, if we introduce a weight per 
galaxy proportional to its fluence (number of photons per 
unit time per unit area at the detector). It is sufficient to 
consider identical isotropic emitters, for which the fluence 
is $dn/dt \sim E\,n(E) \sim E^3$ where the extra factor of energy 
flux as compared to photon flux comes again from the 
transformation from intervals of time at the source and at the 
observer. 

The number of photons per second received from such 
a source is proportional to $1/\gamma^3(1-\beta\mu')^3$ and therefore the 
fluence weighted mean observed photon energy should be 
given by 

$$\langle E\rangle = E_0\,\frac{\int d\beta\,\beta^2\rho_0(\beta)\int d\mu\,(1-\beta\mu)\,\gamma^{-4}(1-\beta\mu)^{-4}}{\int d\beta\,\beta^2\rho_0(\beta)\int d\mu\,(1-\beta\mu)\,\gamma^{-3}(1-\beta\mu)^{-3}} \qquad (7)$$

where the first factor of $1-\beta\mu$ is the asymmetry of the observed 
phase-space distribution from the light-cone effect, 
and where we have assumed that the distribution of velocities 
is isotropic and have dropped the prime. The integrals 
are elementary, and we readily find $\langle E\rangle = E_0/\langle 1/\gamma\rangle$, fully 
consistent with the result obtained above. Without the light-cone 
$1-\beta\mu$ term we would have obtained a different result. 
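
The role of the light-cone factor in this consistency check can be seen in a small numerical experiment. The Python sketch below evaluates the fluence-weighted mean energy for sources that all share a single speed $\beta$ (so that $E_0/\langle 1/\gamma\rangle$ reduces to $\gamma E_0$), with and without the factor $1-\beta\mu$. The function name, the single-speed simplification and the midpoint quadrature are assumptions of this illustration.

```python
import math


def fluence_weighted_energy(beta, include_light_cone=True, n_mu=20000, e0=1.0):
    """Fluence-weighted mean photon energy for isotropically moving sources
    of a single speed beta, integrating over the direction cosine mu.

    The weight per source is the photon rate, (E/E0)^3, optionally multiplied
    by the light-cone factor (1 - beta*mu).
    """
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    numerator = 0.0
    denominator = 0.0
    for k in range(n_mu):
        mu = -1.0 + 2.0 * (k + 0.5) / n_mu        # midpoint rule on [-1, 1]
        doppler = 1.0 / (gamma * (1.0 - beta * mu))  # E / E0
        weight = doppler ** 3
        if include_light_cone:
            weight *= 1.0 - beta * mu
        numerator += weight * doppler
        denominator += weight
    return e0 * numerator / denominator


if __name__ == "__main__":
    beta = 0.3
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    print(fluence_weighted_energy(beta, True), gamma)
    print(fluence_weighted_energy(beta, False), gamma * (1.0 + beta ** 2 / 3.0))
```

With the light-cone factor the result is $\gamma E_0$; dropping it gives $\gamma(1+\beta^2/3)E_0$ for a single speed, which does not agree with the unresolved-source calculation.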

For non-relativistic systems with $\beta^2 \ll 1$ the effect of 
the internal motions of emitters within unresolved objects 
is to give a change of energy $\delta E/E \simeq \langle\beta^2\rangle/2$ which is a 
blue-shift and just opposite to the usual transverse Doppler 
redshift. 



4 SURFACE BRIGHTNESS MODULATION 

In §2 we implicitly assumed that all galaxies are observed 
and catalogued. But in reality the galaxies in the spectro- 
scopic sample were selected according to their apparent lu- 
minosities. As discussed in §3, a galaxy's apparent lumi- 
nosity depends on its state of motion through the beaming 
effect which changes the surface brightness and hence lumi- 
nosity. This results in a bias for the redshifts of the cluster 
galaxies which are selected for redshift measurement as the 
surface brightness modulation means that galaxies moving 
away from us in a given region of space have a higher limit 
on their intrinsic luminosities than galaxies moving towards 
us. For the WHH experiment the effect is quite strong, and 
strongly increasing with cluster redshift, since the flux limit 
(r = 17.8) is only about one magnitude fainter than M* 
even at the minimum redshift limit, so the flux selection 
limit falls on the steep end of the luminosity function (LF) 
where small fractional changes in luminosity have a large 
effect on the number of sources detected. 

We then consider the effect on the BCGs. For these the 
flux limit is irrelevant, but there is a bias because velocities 
can change the ranking of the two brightest galaxies. This 
turns out to be almost independent of cluster redshift. 

4.1 Flux-Limited Galaxies 

For low velocities $\beta \ll 1$ the fractional change of the fluence 
is $\Delta n/n = 3\beta_x$, but to obtain the observed photon count, 
we also need to allow for the change in frequency and the 
limits imposed by the broad-band filter. Combining these, 
the fractional change in apparent luminosity for a galaxy 
in a cluster at redshift $Z$ is $\Delta l/l = (3 + \alpha(Z))\beta_x$, where 
the effective spectral index, for a photon counting detector 
with response curve $R(\lambda)$, is 
$\alpha(Z) = -d\ln(\int d\lambda\,\lambda R(\lambda/(1+Z))f_\lambda)/d\ln(1+Z)$. 
This depends on the details of the galaxy 
SED, but is found, using SEDs for E, S0 and Sb galaxies 
from Coleman, Wu and Weedman (1980), to be close to 2 for 
galaxies at the relevant redshifts (see figure 1), reflecting the 
fact that in the r-band, galaxies tend to have flat $f_\lambda$ curves. 

The modulation of the number density of detectable 
objects is given by the product of $\Delta l/l$ and the logarithmic 
derivative $\delta(Z) = -d\ln n(>L_{\rm lim}(Z))/d\ln L$ which, as 
mentioned, is a strongly increasing function of redshift. Ideally 
we would calculate this using a luminosity function appropriate 
for the actual cluster galaxies used in the study, 
and this will, in general, depend on the projected distance 
from the cluster center. However, Hansen et al. (2009) have 
shown that while the mix of red vs. blue galaxies changes 
strongly with radius, the overall luminosity function does 
not vary much, and the parameters are not very different 
from the field galaxy luminosity function, so we will use 
the latter, as determined for SDSS by Montero-Dorta & 
Prada (2009), as a proxy. Their estimate of the LF obtained 
from the r-band magnitudes K-corrected to $Z = 0.1$ has 
$M_* - 5\log_{10}h = -20.7$ and faint end slope of $\alpha = -1.26$. 
The resulting $d\ln n(>L)/d\ln L$, computed using the flux 
limit $r = 17.77$ appropriate for the SDSS spectroscopic sample 
used by WHH, is shown as the dot-dash curve in figure 
2. 

Figure 1. Spectral index vs. redshift for representative galaxy 
types observed in Sloan r-band. 
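
To give a feel for how steeply the logarithmic derivative rises near the knee of the luminosity function, the following Python sketch evaluates $-d\ln n(>L)/d\ln L$ as a function of $L/L_*$. It assumes a Schechter form $\phi(L) \propto (L/L_*)^{\alpha}e^{-L/L_*}$ with the faint end slope quoted above; the Schechter form itself, the function names and the simple Simpson quadrature are assumptions of this example, and no attempt is made here to convert the flux limit into $L_{\rm lim}(Z)$.

```python
import math


def cumulative_schechter(x, alpha, n_steps=20000, u_max=60.0):
    """Integral of t**alpha * exp(-t) from x to infinity, i.e. n(>L)/phi_*
    for a Schechter function with x = L/L_*.

    Uses Simpson's rule on t = x + u for 0 <= u <= u_max; the tail beyond
    u_max is negligible because of the exponential cut-off.
    """
    h = u_max / n_steps  # n_steps must be even for Simpson's rule
    total = 0.0
    for k in range(n_steps + 1):
        t = x + k * h
        f = t ** alpha * math.exp(-t)
        if k == 0 or k == n_steps:
            total += f
        elif k % 2 == 1:
            total += 4.0 * f
        else:
            total += 2.0 * f
    return total * h / 3.0


def log_slope(x, alpha=-1.26):
    """-d ln n(>L) / d ln L evaluated at L = x * L_*."""
    return x ** (alpha + 1.0) * math.exp(-x) / cumulative_schechter(x, alpha)


if __name__ == "__main__":
    for x in (0.5, 1.0, 3.0):
        print(x, log_slope(x))
```

The slope grows monotonically with $L/L_*$ and tends to $L/L_* - \alpha$ for bright limits, which is why a flux limit close to $M_*$ makes the detected number so sensitive to small changes in apparent luminosity.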

Finally, we need the average of $-(3+\alpha)\,d\ln n(>L)/d\ln L$ 
over the redshift distribution for the galaxies actually 
used in the experiment. The 7,800 clusters used by WHH 
were selected by applying a richness limit to the parent GMBCG 
catalog (Hao, J., et al. 2010) that contains 55,000 clusters 
extending to $Z = 0.55$. These clusters were derived from 
the SDSS photometric catalog that is much deeper than the 
spectroscopic catalog. Consequently, at the redshifts where 
the spectroscopically selected galaxies live, this parent catalog 
is essentially volume limited for the clusters used, so the 
redshift distribution for the cluster members used is essentially 
the same as that for the entire spectroscopic sample, 
save for the fact that the GMBCG 
catalog has a lower redshift limit $Z_{\rm lim} = 0.1$, which is very 
close to the redshift where $dN/dZ = Z^2 n(Z)$ peaks. This is 
the bell shaped curve in figure 2. Combining these we find 
$\langle d\ln n/d\ln L\rangle = \int dZ\,Z^2 n(Z)\,d\ln n/d\ln L / \int dZ\,Z^2 n(Z) \simeq -2.0$ 
with integration range $0.1 < Z < 0.4$, and the average 
$-\langle(3+\alpha(Z))\,d\ln n(>L)/d\ln L\rangle \simeq 10$. This may be a 
slight overestimate, as the cluster catalogue is not precisely 
volume limited and the actual $dN/dZ$ may lie a little below 
the solid curve in figure 2 at the highest redshifts. 

For the WHH experiment, the surface brightness mod- 
ulation effect is considerably larger in amplitude than the 
transverse Doppler and light-cone effects, but has opposite 
sign. For isotropic orbits the combination of the TD, LC and 
SB effects is 

$$\langle\delta z\rangle = (2.5 - \langle(3+\alpha(Z))\delta(Z)\rangle)\langle\beta_x^2\rangle \simeq -7.5\langle\beta_x^2\rangle. \qquad (8)$$
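
The bookkeeping behind the coefficient in equation (8) can be laid out explicitly. In the Python sketch below the TD term contributes $3/2$ and the LC term contributes $1$ in units of $\langle\beta_x^2\rangle$ for isotropic orbits, while the SB term contributes $-(3+\alpha)\delta$. The representative values $\alpha = 2$ and $\delta = 2$ (chosen so that the product is about 10, as estimated above), the identification $\langle\beta_x^2\rangle = \sigma_{\rm los}^2/c^2$ and the illustrative dispersion of 600 km/s are assumptions of this example.

```python
C_KMS = 299792.458  # speed of light in km/s


def kinematic_coefficient(alpha, delta):
    """Coefficient of <beta_x^2> in the combined TD + LC + SB shift,
    assuming isotropic orbits so that <beta^2>/2 = 1.5 <beta_x^2>."""
    td = 1.5                        # transverse Doppler
    lc = 1.0                        # light-cone effect
    sb = -(3.0 + alpha) * delta     # surface brightness modulation
    return td + lc + sb


def kinematic_shift_kms(sigma_los_kms, alpha=2.0, delta=2.0):
    """Combined kinematic shift in km/s for a line-of-sight velocity
    dispersion sigma_los_kms, taking <beta_x^2> = (sigma_los / c)^2."""
    beta_x_sq = (sigma_los_kms / C_KMS) ** 2
    return kinematic_coefficient(alpha, delta) * beta_x_sq * C_KMS


if __name__ == "__main__":
    print(kinematic_coefficient(2.0, 2.0))
    print(kinematic_shift_kms(600.0))
```

With these inputs the kinematic terms alone amount to a blue-shift of several km/s, comparable in size to the measured offset.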
Figure 2. The dot-dash curve is the logarithmic derivative of 
the comoving density of objects above the luminosity limit as a 
function of redshift. The bell-shaped curve is $dN/dZ = Z^2 n(Z)$ 
and the solid curve is that truncated at the minimum redshift 
imposed by the parent cluster catalogue. The mean of the 
log-derivative, averaged over the redshift distribution, turns out to be 
$\simeq 2.0$. 



4.2 Effect on BCGs 

The TD and LC effects act on all galaxies, including the 
BCGs, in the same way. The SB effect is different; in the 
WHH analysis, only clusters with at least 5 measured redshifts 
were used, so it is safe to assume that the brightest 
cluster galaxies will be unaffected by the flux limit. However, 
for some small fraction of the clusters, the two brightest 
galaxies will have magnitudes that are sufficiently close 
that the effect of surface brightness modulation by the motions 
will be enough to change their ranking, resulting in a 
bias. In principle, this effect could be eliminated by only using 
clusters where the difference between the two brightest 
galaxies is sufficiently large that the velocities cannot change 
the ranking. 

To analyse this, let the joint distribution of difference 
of intrinsic magnitudes $m_{ab} \equiv m_a - m_b$, and line-of-sight 
velocities $\beta_a$, $\beta_b$ for pairs of top two ranked cluster galaxies, in 
no particular order, be $P_0(m_{ab},\beta_a,\beta_b)$. This is a symmetric 
function of $m_{ab}$. 

The velocities change the observed surface brightnesses 
of the galaxies, and hence the difference of 
observed magnitudes is $m'_{ab} = m_{ab} - \kappa(\beta_a - \beta_b)$, where 
$\kappa = (\ln(10)/2.5)(3 + \alpha(Z))$, so the observed distribution is 
$P(m_{ab},\beta_a,\beta_b) = P_0(m_{ab} - \kappa(\beta_a - \beta_b),\beta_a,\beta_b)$, the Jacobian 
of the transformation from intrinsic to observed magnitude 
being unity. 

The probability distribution for the velocity of the first 
ranked galaxy $\beta_1$ is then 

$$P(\beta_1) = \int_{-\infty}^{0} dm_{ab}\int_{-\infty}^{\infty} d\beta_b\,P(m_{ab},\beta_1,\beta_b) + \int_{0}^{\infty} dm_{ab}\int_{-\infty}^{\infty} d\beta_a\,P(m_{ab},\beta_a,\beta_1) \qquad (9)$$

i.e. the sum of the distribution function for $\beta_a$ if $m_{ab} < 0$ 
and the DF for $\beta_b$ if $m_{ab} > 0$. 

Making a Taylor expansion of $P(m_{ab},\beta_a,\beta_b)$ with 
respect to $m_{ab}$ and performing the integrals yields 

$$P(\beta_1) = P_0(\beta_1) - 2\kappa\beta_1 P_0(m_{ab} = 0, \beta_1) \qquad (10)$$

which depends on the joint distribution of the intrinsic 
magnitude difference and the velocity of one or other of the two 
brightest galaxies. The mean redshift offset is then 

$$\langle\delta z_{BCG}\rangle = -\kappa\sigma_{ab}^2 P_0(m_{ab} = 0)/c^2 \qquad (11)$$

where $\sigma_{ab}^2 = \langle(\beta_a - \beta_b)^2 | m_{ab} = 0\rangle$ is the variance of the 
relative velocity of the two brightest galaxies given that they 
have similar magnitudes. This is something that is straightforward 
to measure from the data. It is reasonable to expect 
that this is larger than (twice) the velocity variance 
for brightest cluster galaxies. Smith et al. (2010) have measured 
the distribution for magnitude differences and find 
$P_0(m_{ab} = 0) \simeq 0.35$ so we then have 

$$\langle\delta z_{BCG}\rangle \simeq -0.32(3 + \alpha(Z))\sigma_{ab}^2/c^2 \simeq -1.9\,{\rm km/s}/c\;(\sigma_{ab}/600\,{\rm km/s})^2.$$

Note that the surface brightness boosting effect on BCGs 
does not have the strong redshift dependence that is expected 
for the flux-selected galaxies. 



5 PREDICTING THE REDSHIFT PROFILE 

We can now make a prediction for the combined 
GR+TD+LC+SB effect as a function of radius using the 
observed velocity dispersion data provided by WHH. We 
start with a discussion of dynamical analysis of a composite, 
or stacked, cluster and how this differs from the analysis 
of a single cluster. We then review the relevant properties of 
the BCGs; both their cluster-centric kinematics and their 
halo properties. We then attempt to combine all of the ef- 
fects discussed above to predict the expected profile of the 
redshift offset. 

5.1 The Gravitational Redshift 

A composite cluster differs from a single cluster in several 
ways. Individual clusters are very much still in the process 
of forming and so tend to be unrelaxed and to have strong 
substructure. They can also undergo very strong fluctuations 
in density and potential as they go through the process 
of merging. In the composite, any such fluctuations and 
substructure will be averaged out, and the form of the composite 
can, in the limit of a large number of contributing clusters, 
be assumed to be nearly spherically symmetric (more 
precisely the composite will have symmetry about the 
line-of-sight axis but may have some elongation along the line 
of sight) and can evolve only on a cosmological timescale. 
These are obviously beneficial differences; the usual assump- 
tions of stability and symmetry which apply only poorly to 
individual clusters should be well obeyed for a composite. 

In an individual single cluster, the gravitational poten- 
tial is a function of position and time. In the composite con- 
structed by WHH, the individual clusters span more than an 
order of magnitude in range of mass, so galaxies at the same 
position in the composite will be subject to very different 
gravitational accelerations, and there will also be fluctuating 
forces that can, for example, stochastically heat a particle 
from a low energy orbit to a higher energy one, as happens as 
a result of violent relaxation during merging (Lynden-Bell, 
1967). Consequently, unlike an individual cluster where the 
galaxies behave like a fluid in phase-space, the galaxies in 
the composite behave like a gas where initially neighbour- 
ing particles will later be found to be widely separated. The 
galaxies in a composite therefore do not obey the Vlasov 
(or collisionless Boltzmann) equation, which is usually the 
starting point for derivation of the equations of continuity 
for matter and momentum that allow one to relate density, 
potential and velocity dispersions. 

Nonetheless, a composite has a velocity dispersion ten- 
sor and a gravitational potential, both functions of posi- 
tion, and both of whose projections are observable. Each is 
some kind of average over the cluster population. But, given 
the large range in cluster masses, and the issue of how the 
centres of the clusters were determined in order to do the 
stacking, the question of how exactly these are related is 
non-trivial. 

WHH modeled both of these observables as being the 
averages one would obtain for spherical clusters with a power 
law distribution in virial mass, truncated at the same limits 
as used in selecting the clusters, and having NFW model 
(Navarro et al, 1997) profiles. The amplitude and index 
of the mass function and the NFW concentration parameter 
were treated as free parameters determined by matching 
the predicted velocity dispersion to that observed. The 
mass-anisotropy degeneracy issue was addressed by performing 
the modeling for two values of the anisotropy parameter 
$\beta = 1 - \sigma_\theta^2(r)/\sigma_r^2(r) = 0, 0.4$ which span the range suggested 
by simulations and observations. The effect of using BCGs 
as centres was incorporated in the model by boosting the 
line-of-sight model velocity dispersions by adding the BCG 
velocity dispersion in quadrature: $\sigma^2(r_\perp) \to \sigma^2(r_\perp) + \sigma_{BCG}^2$ 
with $\sigma_{BCG}$ assumed to be 35% of the total velocity dispersion. 

Here we show that there is a more direct way to relate 
the velocity dispersion and the potential that does not re- 
quire any modelling of the mass function of the clusters. We 
find that the velocity moments determined from the com- 
posite obey Euler's equation and directly provide, without 
any correction for motions of the BCGs, the average grav- 
ity, relative to the BCG, as a function of distance from the 
BCG. This is almost exactly what is needed to predict the 
gravitational redshift of galaxies relative to the BCG, which 
is what is measured. Advantages of this approach are sim- 
plicity, and that it allows a non-parametric reconstruction of 
the gravity which, given the great precision of the velocity 
dispersion data, is eminently practical and obviates the need 
for assumptions about the profiles of the clusters. If clusters 
were individually spherical this analysis would be exact. The 
only problem is that Euler's equation provides the average 
gravity weighted by galaxy number, whereas the gravita- 
tional redshift is the average potential weighted by galaxy 
number. The gradient of the latter is not precisely equal to 
the former; so the relationship suffers an asphericity bias. 
This bias can be estimated from simulations or, in princi- 
ple, from the projected shapes of the actual clusters used in 
the measurement. Here we use a simple analytic model with 
realistic quadrupole moments for the density to show that 
the effect of this kind of low-order cluster asphericity is in 
fact very small. Displacement of the BCG from the centre
of mass of the cluster also introduces asphericity. This produces
a bias that falls off rapidly with distance, but may be
important at small impact parameter. 

Let us assume as our fundamental model for galaxies 
that for a cluster, with mass, size, shape, orientation etc. de- 
noted by label C, there is a phase-space distribution function 
$\rho_{CG}(\mathbf{r}, \mathbf{v})$ that is in general dependent on the type and luminosity
of the galaxy denoted by an abstract index G; that the
observed galaxies are a Poisson sample of this density field,
and that $\rho_{CG}(\mathbf{r}, \mathbf{v})$ obeys the collisionless Boltzmann equation.
The idea here is that things like dynamical friction and
response to the cluster environment happen slowly, at least 
in an ensemble average sense, and that the instantaneous 
relation between the phase-space coordinates is the same as 
for massless test particles. The PSDF is interpreted here as 
a probability density (e.g. Binney & Tremaine, 2008). 

Consider a composite cluster constructed by making a 
big realisation of clusters drawn from a probability distri- 
bution function P(C) for the cluster attributes and with 
galaxies generated from the distribution function to make 
a big synthetic redshift survey; selecting clusters by some 
suitably defined cluster identification algorithm and then 
stacking them in redshift space relative to suitably defined 
centres. These might be defined, as in WHH, as the loca- 
tions of the most luminous galaxies that are likely to lie 
near the bottom of the cluster potential well. Alternatives 
would be to define the origin as some kind of centroid of 
the redshift-space coordinates of the galaxies in each clus- 
ter, or one might use the X-ray centroid to give the angular 
position on the sky. 

At any position r relative to the centre of some cluster 
C, the zeroth, first and second moments of the velocity over 
the PSDF obey the time dependent Euler equation: 

$$\partial_t (n\langle v_i\rangle) + \partial_j (n\langle v_i v_j\rangle) + n\,\partial_i\Phi = 0 \qquad (13)$$

where, as usual, $n = \int d^3v\, \rho(\mathbf{r}, \mathbf{v})$, $\langle v_i\rangle = n^{-1}\int d^3v\, \rho(\mathbf{r}, \mathbf{v})\, v_i$
etc., and all are understood to be dependent on the particular
choice of cluster and tracer type G and $\Phi$ is determined
by the cluster only. This is valid in coordinates that are relative
to the instantaneous position and velocity of the centre,
and $\Phi_C$ may also be defined to be relative to the potential
at the spatial origin.

The Euler equation is an expression of the conservation 
of momentum. This may also be obtained directly from the 
stress, or momentum flux, tensor. For a Newtonian gravitating
system composed of a large number of particles the
stress has both a kinetic component and a contribution from
the gravitational field:

$$T_{ij} = \sum_G m_G n_{CG} \langle v_i v_j\rangle_{CG} + \frac{1}{4\pi G}\left(g_i g_j - \delta_{ij} g^2/2\right) \qquad (14)$$

(Maxwell, 1875; Misner, Thorne & Wheeler, 1973), G
here being generalised to include the dark matter particles
and $g_i = -\partial_i\Phi_C$. The momentum density is
$T^{0i} = \sum_G m_G n_{CG}\langle v_i\rangle_{CG}$ and conservation of momentum:
$T^{i\mu}{}_{,\mu} = 0$ is

$$\sum_G m_G \left[\partial_t(n_{CG}\langle v_i\rangle_{CG}) + \partial_j(n_{CG}\langle v_i v_j\rangle_{CG}) + n_{CG}\,\partial_i\Phi_C\right] = 0. \qquad (15)$$

In general, the quantity in square brackets [. . .] need not vanish
for every component, only the mass-weighted sum has to
vanish. For example, in a thermal, ionised plasma, the kinetic
stress for both components is the same, the velocity
dispersions scaling inversely with mass, while the gravity 
acts primarily on the ions. Here the electromagnetic inter- 
action allows the transfer of momentum between the two 
components. For collisionless particles, which we are assum- 
ing galaxies mimic, there is no way for different species to 
exchange momentum and the Euler equation is obeyed sep- 
arately for each of the independent components. 
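
The step from the stress tensor (14) to the summed Euler equation (15) can be made explicit. The following is a short sketch of that step, assuming only that the field is curl-free, $g_i = -\partial_i\Phi_C$, and that it obeys Poisson's equation with total mass density $\rho = \sum_G m_G n_{CG}$ (the symbol $\rho$ is introduced here for illustration):

```latex
\begin{align}
\partial_j\!\left[\frac{1}{4\pi G}\left(g_i g_j - \tfrac{1}{2}\delta_{ij}\, g^2\right)\right]
  &= \frac{1}{4\pi G}\left[g_i\,\partial_j g_j
     + g_j\left(\partial_j g_i - \partial_i g_j\right)\right] \\
  &= \frac{1}{4\pi G}\, g_i \left(-4\pi G \rho\right)
   = -\rho\, g_i
   = \sum_G m_G\, n_{CG}\, \partial_i \Phi_C .
\end{align}
```

The antisymmetric term vanishes because $\partial_j g_i = \partial_i g_j$ for a potential field, and $\partial_j g_j = -4\pi G\rho$ is Poisson's equation. Adding the divergence of the kinetic stress and the time derivative of the momentum density then gives the three terms inside the brackets of equation (15).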

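The zeroth, first and second velocity moments for the
composite at position r are

$$\begin{aligned}
n_G &= \int dC\, P(C)\, n_{CG} \\
n_G\langle v_i\rangle_G &= \int dC\, P(C)\, n_{CG}\langle v_i\rangle_{CG} \\
n_G\langle v_i v_j\rangle_G &= \int dC\, P(C)\, n_{CG}\langle v_i v_j\rangle_{CG}
\end{aligned} \qquad (16)$$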

where P(C) is the probability density for clusters as a func- 
tion of size, shape, orientation etc. We then have 



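$$\partial_t(n_G\langle v_i\rangle_G) + \partial_j(n_G\langle v_i v_j\rangle_G) = \int dC\, P(C)\left[\partial_t(n_{CG}\langle v_i\rangle_{CG}) + \partial_j(n_{CG}\langle v_i v_j\rangle_{CG})\right] \qquad (17)$$

but the quantity in square brackets is, by equation (13), just $-n_{CG}\,\partial_i\Phi_C$ so

$$\partial_t(n_G\langle v_i\rangle_G) + \partial_j(n_G\langle v_i v_j\rangle_G) = n_G\, g_{Gi} \qquad (18)$$

where

$$\mathbf{g}_G = -\frac{\int dC\, P(C)\, n_{CG}\nabla\Phi_C}{\int dC\, P(C)\, n_{CG}} \qquad (19)$$

which is the galaxy number weighted mean gravity relative
to the BCGs.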

The moments determined from the composite therefore 
satisfy Euler's equations for particles moving in a potential 
whose gradient is (minus) the galaxy weighted mean gravity. 
In equilibrium, the rate of change of the momentum density 
must vanish, the pressure and gravitational stress tensors 
adjusting themselves so that the divergence of the total mo- 
mentum flow vanishes. That the kinetic pressure gradient 
should accurately balance the gravitational force density is 
essentially guaranteed regardless of how the centres are cho- 
sen. For example, one might select gas rich galaxies as cen- 
tres, and these will have a tendency to be falling into the 
cluster for the first time. There will therefore be a net mo- 
mentum density for the general cluster galaxies relative to 
these centres: $n\langle v_i\rangle \neq 0$. But as all ensemble average properties
of the cluster population can evolve only on a cosmological
time-scale, the rate of change of the momentum
density is $\partial_t(n\langle v_i\rangle) \sim H n\langle v_i\rangle$ which is smaller than either
the kinetic or gravitational stress divergence by the ratio of 
the dynamical time to the Hubble time. Thus, even for this 
rather extreme choice of 'centres', at the virial radius one 
would expect at most a small ~ 10% force imbalance. 

If we ignore the line-of-sight elongation from cluster 
selection effects, the composite density, potential etc. will 
all be spherically symmetric and the mean gravity will 
be radially directed. For a spherical system, the kinetic 
stress tensor for the Gth component (divided by m G ) is 
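$T^{(k)}_{ij} = n\left((\sigma_r^2 - \sigma_\perp^2)\hat{x}_i\hat{x}_j + \sigma_\perp^2\delta_{ij}\right)$, where $\sigma_r$ and $\sigma_\perp$ are the
radial and tangential velocity dispersions, and its divergence
is $\partial_j T^{(k)}_{ij} = \hat{x}_i\left(\partial_r(n\sigma_r^2) + 2n(\sigma_r^2 - \sigma_\perp^2)/r\right)$ so

$$\partial_r(n\sigma_r^2) + 2n(\sigma_r^2 - \sigma_\perp^2)/r - n g = 0 \qquad (20)$$

the familiar form for Euler's equation for a spherical equilibrated
system.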

If we assume some velocity dispersion anisotropy, we
can, in principle, de-project the projected density and velocity
dispersion to obtain $n(r)$ and $\sigma_r(r)$ from which the Euler
equation provides the radially directed gravity vector g(r).
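
As an illustration of how the spherical Euler equation (20) turns de-projected profiles into gravity, the following is a minimal sketch, not the procedure used by any of the cited analyses. The function name `radial_gravity` and the power-law test profiles are hypothetical, chosen only because the result can be checked analytically; real data would replace them with the de-projected $n(r)$ and dispersions.

```python
import numpy as np

def radial_gravity(r, n, sigma_r2, sigma_t2):
    """Radial gravity from equation (20):
    g = [d(n sigma_r^2)/dr + 2 n (sigma_r^2 - sigma_t^2) / r] / n.
    Negative values mean gravity points towards the centre."""
    pressure = n * sigma_r2
    dpdr = np.gradient(pressure, r)  # finite differences on a non-uniform grid
    return (dpdr + 2.0 * n * (sigma_r2 - sigma_t2) / r) / n

# Hypothetical test profiles (assumed for illustration, not measured data):
# n ~ r^-gamma and isotropic sigma^2 ~ r^(2-gamma).
gamma = 2.2
r = np.logspace(-1, 1, 2001)               # radius in Mpc
n = r ** (-gamma)                          # arbitrary normalisation
sigma2 = 545.0 ** 2 * r ** (2.0 - gamma)   # (km/s)^2

g = radial_gravity(r, n, sigma2, sigma2)   # units: (km/s)^2 / Mpc

# For these power laws the derivative can be done by hand:
g_exact = -2.0 * (gamma - 1.0) * sigma2 / r
```

For isotropic orbits the anisotropy term drops out; passing different radial and tangential dispersions exercises the second term of equation (20).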

The gravitational redshift is the projection of the galaxy
weighted average 3-D potential, relative to the cluster centre,

$$\Phi_G = \frac{\int dC\, P(C)\, n_{CG}\Phi_C}{\int dC\, P(C)\, n_{CG}} \qquad (21)$$

which is obviously very closely related to the gravity vector
furnished by the Euler equation (19).

This is all very nice, but the catch is that for real clus- 
ters with asymmetry and substructure the potential that one 
obtains by integrating the galaxy number weighted average 
gravity will not be precisely equal to the galaxy weighted av- 
erage potential. Therefore predicting the gravitational red- 
shift from the observed velocity dispersion is more compli- 
cated than for an individual relaxed spherical cluster. The 
problem here is that the observable properties, being aver- 
ages of the gravity or potential weighted by galaxy number, 
are biased by asphericity of the clusters. Part of this as- 
phericity comes from the fact that BCGs do not in general 
lie at the minimum of the potential, but the biases will be 
present even in the limit of very cold reference galaxies. 

The above analysis has focused on recovering the po- 
tential from the velocity dispersions alone. This was the ap- 
proach taken by WHH, in fitting the velocity dispersion pro- 
file to a parameterised model of stacked NFW models. Kine- 
matic data have the advantage, modulo velocity anisotropy 
issues, of providing an unbiased view of the mass, but rely- 
ing on the kinematic data alone, and imposing constraints 
on the form of the model, is potentially dangerous. If BCGs 
do not lie at the centres of clusters, and indeed there is con- 
siderable evidence that they have substantial displacements, 
then a NFW model with its $\rho \sim 1/r$ central cusp would not
be a good model for the number density profile around the 
BCGs, which will have a core. 

The NFW model predicts that the velocity dispersion 
falls at very small impact parameter where we see low-energy 
galaxies trapped deep in the central conical potential well. 
Unfortunately the data presented in WHH supplementary 
figure 2 do not have sufficient resolution to reveal whether 
this is or is not obeyed. The data show the velocity dispersion
to be very flat for $r_\perp < 0.5$ Mpc and do not allow one
to discriminate between a core caused by finite energies of 
BCGs and a cusp. If there is a core, then the predicted GR 
effect will be reduced. We will attempt to estimate this be- 
low. The NFW model is also not appropriate at large radius. 
The density in the model falls off as $\sim 1/r^3$ whereas we know
from measurements of the cluster-galaxy cross-correlation
function that, to the extent that galaxies are fair tracers of
the mass, the real profile is more like $1/r^2$. A more reliable
approach would be to use the kinematics and the projected
density profile to determine a mass-per-galaxy $M_G$ at around
the virial radius, preferably correcting for the asphericity biases,
and then use the inverse Laplacian of the galaxy number
density to predict the gravitational potential. This could
be done using parameterised models, but if so, these should 
be allowed to have realistic profiles. 

Poisson's equation being linear, the Laplacian of the
ensemble (spatial) average of the potential at position r relative
to the centre is

$$\nabla^2\bar\Phi(r) = 4\pi G M_G \frac{\int dC\, P(C)\, n_{CG}(r)}{\int dC\, P(C)} = 4\pi G M_G\, n_G(r) \qquad (22)$$

which is spherically symmetric. Inverting this gives the
spatial average potential, relative to the centre, $\bar\Phi(r) =
4\pi G M_G \nabla^{-2} n_G(r)$. But using this to predict the gravitational
redshift also gives an asphericity biased result since
the quantity one measures is

$$\Phi_G = 4\pi G M_G \frac{\int dC\, P(C)\, n_{CG}\nabla^{-2} n_{CG}(r)}{\int dC\, P(C)\, n_{CG}} \qquad (23)$$

and again the inverse Laplacian operator does not commute
with the averaging if the clusters are non-spherical.

These biases are something of a nuisance. However, 
there is some reason to think that they are quite small for 
realistic clusters. Kasun and Evrard (2005) have studied the 
shapes of clusters in numerical simulations. They compute 
second moments of the distribution of particles within $r_{200}$
and find that for massive clusters (virial mass greater than
$3 \times 10^{14} M_\odot$) the modal minor/major axis ratio (the square
root of the ratio of the smallest to largest eigenvalues of the
2nd moment matrix) is about 0.64, and they find that
smaller clusters are rounder. 

A simple model that would reproduce this is a softened
tri-axial isothermal sphere potential $\Phi = \ln(a_{ij}x_i x_j + r_c^2)$.
At three times the core radius this has density with second
moments that reproduce the numerical results if we take
$a_{ij} = {\rm diag}\{1.2, 1, 1/1.2\}$. However, the biases are then very
small; less than one percent for the radial component of the 
gravity and about 3 percent for the potential. This analysis 
only considers the lowest order quadrupole shape anisotropy, 
and does not address the effect of the displacement of the 
BCG from the potential minimum. The latter, however, pro- 
duces a dipole anisotropy for the density, radial gravity, po- 
tential etc., but these decrease with increasing radius, so 
the bias is a strongly decreasing function of radius. While 
the rms displacements of the BCGs are, of course, related 
to their cluster-centric velocity dispersion, the bias cannot
be compensated for by simply adding the BCG velocities in 
quadrature to the model velocities as that has effectively the 
same effect at all radii. 

Clearly what is needed is an estimate of the biases from 
numerical simulations, though it is also possible to attack 
this observationally using the shapes of real clusters seen in 
projection. If the results of this simple analytical model are 
supported, then the conclusion is that the composite cluster 
can be analysed almost exactly as though one were dealing 
with a single spherical equilibrated cluster, and that one can 
reconstruct the potential in the virialised region from Euler's 
equation — using velocity dispersions exactly as observed, 
and without any correction for the BCG velocity dispersion 
— and then armed with the mass-per-galaxy factor the po- 
tential can be determined at larger and smaller scales using 
the de-projected galaxy distribution. 



5.2 BCG Properties 

5.2.1 Intra-cluster Kinematics

The analysis of WHH relies on the assumption that, in an 
average sense, the BCGs used as the origin of coordinates 
in velocity and angle space are a relatively cold population, 
velocity-wise, compared to the other galaxies and are there- 
fore orbiting close to the potential minimum. There are good 
theoretical grounds for believing that the BCGs will indeed 
be colder than the general population, but understanding 
in detail just how cold they are is important here for two 
reasons: first because the kinematically sourced effects de- 
pend on the velocity dispersions of the BCGs and second 
because it can inform us to what extent the mean density 
profile around the BCG is in fact likely to depart from the 
idealised NFW model predictions. 

WHH assume $\sigma_{\rm BCG} = 0.35\sigma_{\rm obs}$, citing Skibba et al.
(2011), in which case the effect on estimates of e.g. the TD
and LC effects is quite small. But this may be a bit low.
Skibba et al. found that the velocity dispersion for the central
galaxies in the clusters was $\sigma_{\rm cen} \simeq 0.5\sigma_{\rm cl}$ which is a lot
larger, but not directly measuring the same thing since they
also found that about 30% of the time, the central galaxy
was not in fact the brightest galaxy in the cluster.

Coziol et al. (2009) have measured the distribution of
BCG motions directly and find that $\langle|v_{\rm BCG}|\rangle/\sigma_{\rm cl} = 0.40 \pm 0.04$
for clusters of Abell richness class R=1. These clusters
have mean dispersion $\sigma = 651$ km/s, a little higher than for
the composite cluster here. The mean dispersion for R=0 is
$\sigma = 539$ km/s, for which they find $\langle|v_{\rm BCG}|\rangle/\sigma_{\rm cl} = 0.43 \pm 0.03$
so the appropriate value for the sample here is around 0.42.

For a Gaussian distribution, $\langle|v_{\rm BCG}|\rangle = \sqrt{2/\pi}\,\sigma_{\rm BCG}$ so
this suggests $\sigma_{\rm BCG} = 0.53\sigma_{\rm cl}$ which is again considerably
larger than the value adopted by WHH and consistent with
what Skibba et al. found for the central cluster galaxies.

If we define $\alpha = \sigma_{\rm BCG}^2/\sigma_{\rm cl}^2$ then we have $\sigma_{\rm cl}^2 =
\sigma_{\rm obs}^2/(1 + \alpha)$ and $\sigma_{\rm BCG}^2 = (\alpha/(1 + \alpha))\sigma_{\rm obs}^2$. For $\alpha = 0.25$,
as suggested by the observations, and $\sigma_{\rm obs} = 610$ km/s we
have $\sigma_{\rm BCG} = 270$ km/s and $\sigma_{\rm cl} = 545$ km/s. With this value
the differential TD and LC effects are reduced to about 60%
of what one would expect in the limit that the BCGs lie at
rest at the minimum of the cluster potential. The SB effect,
as we show below, is somewhat less affected.
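
The arithmetic behind these numbers can be checked with a few lines. This is a sketch under two stated assumptions: that $\alpha$ denotes the ratio of squared dispersions (the reading consistent with the quoted 270 and 545 km/s), and that the differential TD and LC effects scale as $\sigma_{\rm cl}^2 - \sigma_{\rm BCG}^2$. The variable names are illustrative only.

```python
import math

mean_abs_ratio = 0.42   # <|v_BCG|> / sigma_cl, the Coziol et al. value quoted above
sigma_obs = 610.0       # observed dispersion relative to the BCGs, km/s

# For a Gaussian, <|v|> = sqrt(2/pi) * sigma, so invert to get sigma_BCG/sigma_cl.
ratio = mean_abs_ratio / math.sqrt(2.0 / math.pi)

# Assumed definition: alpha = sigma_BCG^2 / sigma_cl^2, rounded to 0.25.
alpha = 0.25
sigma_cl = sigma_obs / math.sqrt(1.0 + alpha)
sigma_bcg = sigma_obs * math.sqrt(alpha / (1.0 + alpha))

# Assumed scaling of the differential TD and LC effects, relative to
# the cold-BCG limit in which sigma_cl would equal sigma_obs.
reduction = (sigma_cl**2 - sigma_bcg**2) / sigma_obs**2

print(f"sigma_BCG/sigma_cl = {ratio:.2f}")
print(f"sigma_cl = {sigma_cl:.0f} km/s, sigma_BCG = {sigma_bcg:.0f} km/s")
print(f"differential TD+LC factor = {reduction:.2f}")
```

With these inputs the factor is $(1-\alpha)/(1+\alpha)$, which is where the figure of about 60% comes from.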

We can also estimate the reduction in the gravitational
redshift, assuming that the clusters in which the BCGs
live do indeed have NFW profiles. Vanishing of the second
time derivative of the moment of inertia $I = \sum r^2$
tells us that $\langle|\dot{\mathbf r}|^2\rangle = 3\sigma_{\rm BCG}^2 = \langle r|\nabla\Phi|\rangle$. In the inner
parts of the NFW profile, the potential increases linearly
with radius, so consequently we have $\langle r|\nabla\Phi|\rangle = \langle\Phi\rangle$ so
the predicted gravitational blue-shift for the hot population
relative to the colder BCG population is decreased by
$\delta z = 3\sigma_{\rm BCG}^2/c^2 \simeq 0.9(\sigma_{\rm BCG}/300\,{\rm km/s})^2$ km/s/c. But their
motions will also give them TD and LC red-shifts that are
$\delta z \simeq 2.5\sigma_{\rm BCG}^2/c^2$ which largely counteracts the change in
the GR effect, and from §4.2 the SB effect on the BCGs
gives them a blue shift which we have estimated to be about
$c\,\delta z = -1.9$ km/s.

Finally, one should allow for the fact that the light we 
see from the galaxies will have suffered gravitational redshift 
escaping the halos of the galaxies, and that the starlight will 
also be affected by stellar motions as described above in §3.
This is most important for the BCGs. 



5.2.2 BCG Halo and Internal Kinematics 

Regarding the GR effect, the stellar velocity dispersion in
BCGs is typically $\sigma_* \simeq 250$ km/s (e.g. Bernardi et al,
2007); much larger than that of the run-of-the-mill galaxies,
and quite comparable to the motion of the BCGs in
the cluster halo. The BCGs are unresolved, so we can use
the result of §3 to predict the kinematically sourced blue-shift
$\delta z \simeq -(3/2)(\sigma_*/c)^2 \simeq -0.3$ km/s/c, which is quite
small. If they have flat rotation curve halos, for which
$\Phi = \Phi_0 \ln r$ then, for isotropic orbits, $\Phi_0 = 2\sigma_{\rm DM}^2$ while
vanishing of $\ddot I$ for the stars requires $\Phi_0 = 3\sigma_*^2$. The gravitational
redshift is therefore $\delta z = 3(\sigma_*^2/c^2)\langle\ln(r_{\rm halo}/r_*)\rangle =
0.63\,{\rm km/s}/c\,(\sigma_*/250\,{\rm km/s})^2\langle\ln(r_{\rm halo}/r_*)\rangle$. The problem here
is determining the logarithm since we need to know the
size of the BCG halo (as distinct from the cluster halo).
This could be determined by galaxy-galaxy lensing, and also
could in principle be determined from simulations.

A rough estimate can be obtained from tidal considerations:
The BCG halo density is $\rho_{\rm halo} \simeq 3\sigma_*^2/4\pi G r_{\rm halo}^2$ while
the density of the cluster is $\rho_{\rm cl} \simeq 2\sigma_{\rm cl}^2/4\pi G r^2$ where now r
is the typical cluster-centric distance to the BCG. If this is a
few hundred kpc then the tidal constraint that $\rho_{\rm halo} > \rho_{\rm cl}$
says that $r_{\rm halo}$ can't be bigger than about 1/3 of this. The
scale lengths for BCGs are typically 10 kpc, so this would
suggest that the logarithm is approximately 2.5. If the halos
are really this large, the effect of the motion of the stars
$\delta z \simeq -3\sigma_*^2/2c^2$ is a small correction, and we have a net
redshift $\delta z \simeq 1.25$ km/s/c.
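
The net BCG halo redshift quoted here follows from the two terms above. A minimal numerical check, taking the stellar dispersion of 250 km/s and a logarithm of 2.5 as given in the text (variable names are illustrative):

```python
C_KMS = 299792.458      # speed of light, km/s

sigma_star = 250.0      # stellar velocity dispersion in the BCG, km/s
log_term = 2.5          # adopted <ln(r_halo / r_*)>

# Gravitational redshift of starlight escaping a flat-rotation-curve halo,
# expressed as c * delta_z in km/s.
gravitational = 3.0 * sigma_star**2 / C_KMS * log_term

# Kinematically sourced blue-shift for an unresolved source.
kinematic = -1.5 * sigma_star**2 / C_KMS

net = gravitational + kinematic
print(f"gravitational: {gravitational:+.2f} km/s")
print(f"kinematic:     {kinematic:+.2f} km/s")
print(f"net:           {net:+.2f} km/s")
```

The result depends linearly on the adopted logarithm, which is the poorly known quantity.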

5.3 Revised Prediction for Redshift Profile 

We will proceed in two steps. We bootstrap off the estimate 
of the difference in potential between the BCG and the in- 
nermost point using the WHH stacked NFW model method. 
The innermost data point lies at $r_\perp = 0.6$ Mpc where the assumptions
of virial equilibrium are likely to be well obeyed.
We then extrapolate to larger impact parameters assuming 
galaxies trace the mass and using the cluster-galaxy cross- 
correlation function to get the appropriate ensemble average 
mass profile. 

The stacked NFW model appears to provide a good
fit to the data within 1.2 Mpc (WHH supplementary figure
2) and yields a potential for galaxies at impact parameter
0.6 Mpc corresponding to $\delta z_{\rm GR} = -5.0$ km/s/c. This was obtained
after correcting the velocity dispersions for the motion
of the BCGs, which we have argued above is inappropriate,
so we should increase this accordingly by about 13%.
The finite BCG velocities will reduce the gravitational potential
difference by about 0.9 km/s/c but the BCG halo potential
increases it by an estimated 1.6 km/s/c. The net result
is a potential difference of $\delta z_{\rm GR}(0.6\,{\rm Mpc}) \simeq -6.4$ km/s/c.

We now need to include the kinematic effects. The observed
velocity dispersion at this impact parameter is $\sigma_{\rm obs} =
610$ km/s, so with $\sigma_{\rm BCG} = 0.5\sigma_{\rm cl}$ the LC and TD effects
are $\delta z_{\rm TD+LC} \simeq (3/5)\,2.5\sigma_{\rm obs}^2/c^2 = +1.9$ km/s/c. The SB effect
for the non-BCGs is $\delta z_{\rm SB} \simeq -10.0\sigma_{\rm cl}^2/c^2 \simeq -9.9$ km/s/c
and the SB effect on the BCGs we have estimated to be
about +1.9 km/s/c, and finally the kinematic blue-shift for
the stars in the BCG gives +0.3 km/s/c for a net kinematic
effect $\delta z_{\rm TD+LC+SB}(0.6\,{\rm Mpc}) \simeq -5.8$ km/s/c for a grand total
$\delta z_{\rm GR+TD+LC+SB}(0.6\,{\rm Mpc}) \simeq -12.2$ km/s/c, whereas the
observed value is $\delta z \simeq -2.6$ km/s/c. The uncertainty on this
point is approximately 6 km/s/c so this would appear to be
discrepant, but only at about the 1.5-sigma level.
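
The budget for the innermost point can be tallied explicitly. The sketch below simply adds up the contributions quoted in the text, all expressed as $c\,\delta z$ in km/s; the coefficients are those stated above and the variable names are illustrative.

```python
C_KMS = 299792.458                  # speed of light, km/s
sigma_obs, sigma_cl = 610.0, 545.0  # km/s

# Gravitational part: WHH value, rescaled by 13%, then corrected for
# finite BCG velocities (+0.9) and the BCG halo potential (-1.6).
gr = -5.0 * 1.13 + 0.9 - 1.6

# Kinematic parts.
td_lc = (3.0 / 5.0) * 2.5 * sigma_obs**2 / C_KMS  # light-cone + transverse Doppler
sb_galaxies = -10.0 * sigma_cl**2 / C_KMS         # surface brightness, non-BCGs
sb_bcg = 1.9                                      # surface brightness, BCGs
bcg_stars = 0.3                                   # stellar motions inside the BCG
kinematic = td_lc + sb_galaxies + sb_bcg + bcg_stars

total = gr + kinematic
observed, uncertainty = -2.6, 6.0
n_sigma = (observed - total) / uncertainty

print(f"GR: {gr:.1f}  kinematic: {kinematic:.1f}  total: {total:.1f} km/s")
print(f"offset from observed value: {n_sigma:.1f} sigma")
```

Laid out this way it is clear that the surface brightness term for the non-BCG galaxies dominates the kinematic correction.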

The NFW model predicts $\delta z \simeq -10$ km/s/c for the
outer measurements $r \simeq 3.3, 5.3$ Mpc, and the measurements
straddle this value. While this model may provide a reasonable
description for isolated clusters in the virialised domain,
it is not at all clear that it is appropriate to describe the composite
cluster being studied here. Tavio et al. (2008) have
claimed that beyond the virial radius the density in numerical
LCDM simulations actually falls off like $\rho \sim 1/r$ rather
than the $\rho \sim 1/r^3$ asymptote for the NFW profile, and the
extended peculiar in-fall velocities found by Ceccarelli et
al. (2011) also argue for shallow cluster profiles, but it is not
clear that these results are widely accepted.

An alternative, and possibly more reliable, approach is 
to assume that galaxies trace the mass reasonably well, in 
which case the density profile of the stacked cluster has the 
same shape as the cluster-galaxy cross correlation function 
(e.g. Croft et al, 1997). This has a power-law dependence 
$\rho \sim r^{-\gamma}$ with $\gamma \simeq 2.2$, i.e. intermediate between the NFW
and Tavio et al. model predictions.

For space density $\rho(r) = \rho_0(r/r_0)^{-\gamma}$, where $r_0$ is an arbitrary
fiducial radius, the potential is $\Phi(r) = \Phi_0(r/r_0)^{2-\gamma}$
and the 1-D velocity dispersion, for isotropic orbits, is
$\sigma^2(r) = \sigma_0^2(r/r_0)^{2-\gamma}$ with $\Phi_0 = 2((1-\gamma)/(2-\gamma))\sigma_0^2$.

The projected velocity dispersion measured is related to
the 3-D velocity dispersion by $\sigma^2(r_\perp)/\sigma^2(r) = \int dy\,(1+y^2)^{1-\gamma} / \int dy\,(1+y^2)^{-\gamma/2}$.
The projected potential is related to the 3-D
potential in the same way, so the projected quantities are
related by $\Phi(r_\perp) = -2|1-\gamma|\sigma^2(r_\perp)$. This is the potential
relative to infinity. The difference in projected potential
between two projected radii $r_1$ and $r_2$ is $\Phi(r_2) - \Phi(r_1) \simeq
2.4\sigma^2(r_1)(1 - (r_1/r_2)^{0.2})$ for $\gamma = 2.2$. The resulting GR effect
is shown as the dashed line in figure 3 and is actually
quite similar to the shape of the profile for the WHH NFW
composite model.

The FWHM of the bell-shaped velocity distributions in
WHH figure 1 appears to decrease by about 15% between
the inner bin and the outer points. This is reasonably consistent
with the expected $\sigma^2 \propto r^{-0.2}$ trend predicted if
galaxies trace mass, but this is perhaps fortuitous since the
outer points are well outside the virial radius. Regardless
of whether the galaxies at large radius are equilibrated or
not, we can use the change in the observed velocity dispersion
with radius to obtain the differential TD+LC+SB effect
which is shown, added to the GR effect, as the solid line in
figure 3. The kinematic effects flatten out the predicted profile,
so the prediction is quite different from the gravitational
redshift alone.

The situation is clearly rather complicated, especially
when using BCGs as the origin of coordinates since the effects
depend on things like the relative velocities of the top
ranked pair of cluster galaxies, and on the BCG halo properties,
that are quite poorly known. However, those factors
only influence the prediction for the innermost data point.
The empirically based theoretical prediction for the profile
of the redshift offset for the hot population as a function
of impact parameter at $r_\perp > 0.6$ Mpc is the most robust;
if galaxies are reasonable tracers of the mass then the profile
should be very flat, quite unlike the GR effect from a NFW
profile. The predicted GR and total effects are shown in figure 3.
However, this analysis ignores the effect of secular
infall and out-flow which we consider next.

Figure 3. Data points from figure 2 of WHH and prediction
based on mass-traces-light cluster halo profile and measured velocity
dispersions as described in the main text. The dashed line
is the gravitational redshift prediction, which is similar to the
WHH model prediction. The solid curve includes, in addition,
the kinematically sourced effects that are the main focus of this
paper.



6 EFFECT OF INFALL AND OUTFLOW 

The discussion so far has focused mostly on the stable, 
virialised regions. Clusters, however, are evolving structures 
and the mass within any fixed physical radius M(r) will in 
general be changing. Outside of the virial radius (generally 
considered, inspired by the spherical collapse model, to be 
the radius within which the mean enclosed mass density is 
37r /Gt 2 ) we expect to see net infall, and the enclosed mass at 
those radii will be increasing with time, while at still larger 
radii there will be outflow tending asymptotically toward the 
Hubble flow. In the spherical collapse model the transition 
from inflow to outflow takes place at the turnaround radius 
where the mean enclosed mass density is p t — 3ir/32Gt 2 . 
This is for a matter dominated Universe; allowing for a cos- 
mological constant makes only a small change (Lokas & Hoff- 
man, 2001). 

For the empirically motivated $\rho = \rho_0(r/r_0)^{-\gamma}$
model the mean enclosed mass density is $\bar\rho(r) = 3(\gamma -
1)(2\pi G)^{-1}\sigma_0^2 r_0^{\gamma-2} r^{-\gamma}$ and the nominal virial radius is $r_{\rm vir} =
((\gamma - 1)\sigma_0^2 r_0^{\gamma-2} t^2/2\pi^2)^{1/\gamma} \simeq 1.8$ Mpc using $\gamma = 2.2$, $r_0 =
1$ Mpc, $\sigma_0 = 545$ km/s and $t \simeq 1/H = 1/(70$ km/s/Mpc) and
turnaround is at $r_t \simeq 8.7$ Mpc.
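
The two radii can be reproduced by equating the power-law mean enclosed density to the spherical collapse thresholds. The following is a minimal sketch using the parameter values quoted in the text; the helper `radius_for_density` and its argument `k` are hypothetical names introduced here.

```python
import math

gamma = 2.2      # slope of the density profile
r0 = 1.0         # fiducial radius, Mpc
sigma0 = 545.0   # 1-D velocity dispersion at r0, km/s
H = 70.0         # Hubble parameter, km/s/Mpc
t = 1.0 / H      # age approximated as 1/H, in Mpc / (km/s)

def radius_for_density(k):
    """Radius (Mpc) at which the mean enclosed density equals k * pi / (G t^2).

    Uses mean density = 3 (gamma - 1) sigma0^2 r0^(gamma-2) r^(-gamma) / (2 pi G);
    G cancels between the two sides.
    """
    r_to_gamma = (3.0 * (gamma - 1.0) * sigma0**2 * r0**(gamma - 2.0) * t**2
                  / (2.0 * math.pi**2 * k))
    return r_to_gamma ** (1.0 / gamma)

r_vir = radius_for_density(3.0)          # virial threshold 3 pi / (G t^2)
r_turn = radius_for_density(3.0 / 32.0)  # turnaround threshold 3 pi / (32 G t^2)

print(f"r_vir  = {r_vir:.1f} Mpc")
print(f"r_turn = {r_turn:.1f} Mpc")
```

Because the threshold densities differ by a factor of 32, the two radii differ by $32^{1/\gamma}$, independent of the normalisation.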

In the centres of clusters there may be softening of the 
cores which would reduce the enclosed mass and would have 
an associated outflow. 

In any single cluster, the density may be changing
rapidly, on the local dynamical timescale, especially
during mergers and as clumps rain in, but for a composite
cluster such as considered here these rapid changes will
average out and the mass can only change on a cosmological
timescale: $\dot M \sim HM$. For a power law profile with $\gamma \simeq 2$,
$M \sim 4\pi\rho r^3$ and $\dot M \sim 4\pi\rho r^2 v$, where v is the mean infall velocity,
by continuity. So if $\dot M(r) = \alpha(r) H M(r)$, with, by the
above argument, $\alpha(r)$ of order unity, then $v(r) = \alpha(r) H r$.

This secular flow can generate a net offset for the redshifts
in two ways. First, and most importantly for clusters
at $Z \ll 1$, along any line of sight we observe galaxies that lie
in a cone that will be wider at the back of the cluster. At low
redshift this means there will be more galaxies observed at
the back than the front in an intrinsically symmetric cluster.
But we also need to allow for the countervailing bias
caused by the fact that the more distant galaxies will be
fainter which, as we have seen above, overwhelms the effect
of the change of volume in the relevant range of redshifts.
These geometric and flux limit effects, whose effects on the
foreground and background galaxies were discussed by Kim
and Croft (2004), modulate the density per unit line of sight
distance linearly with distance. The real flux limited galaxies
observed along cones behave like particles with no flux
selection observed in cylinders with a phase space density
$\rho'(\mathbf{r}, \boldsymbol\beta) = \rho(\mathbf{r}, \boldsymbol\beta)(1 + 2Hx(\delta(Z) - 1)/cZ)$.

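We can try to use this to estimate the redshift offset as
$\int dx \int d^3\beta\, \rho'(\mathbf{r}, \boldsymbol\beta)\beta_x / \int dx \int d^3\beta\, \rho(\mathbf{r}, \boldsymbol\beta)$. Performing the integrals
over velocity this is

$$\langle\beta_x\rangle_{r_\perp} = \int dx \left(1 + \frac{2Hx}{cZ}(\delta(Z) - 1)\right)\rho(r)\beta_x(r) \Big/ \int dx\, \rho(r) \qquad (24)$$

or, with $\beta_x(r) = \alpha(r)Hx/c$ and an assumed $\sim 1/r^2$ density
profile,

$$\langle\beta_x\rangle_{r_\perp} = \frac{2H^2}{c^2 Z}(\delta(Z) - 1) \int dx\, \frac{\alpha(r)x^2}{r^2} \Big/ \int \frac{dx}{r^2} \qquad (25)$$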

where $r = \sqrt{r_\perp^2 + x^2}$. This is rather messy and, owing to
the presence of the factor $\alpha(r)$, model dependent. But we
can note the following: if we work at $r_\perp \simeq 1$ Mpc/h say, the
integral in the denominator will be $\sim 1/r_\perp$ while the contribution
to the integral in the numerator from $-r_\perp < x < r_\perp$
will be $\sim r_\perp\alpha(r_\perp)$, so we get a partial contribution to
$\langle\beta_x\rangle_{r_\perp} \sim H^2 r_\perp^2/c^2 Z \simeq 5 \times 10^{-7}\,(Z/0.2)^{-1}$, which is very small;
corresponding to a physical velocity of only 0.16 km/s. If
we extend the range of integration beyond $|x| < r_\perp$ this
will increase, but not by a very large factor, since $\alpha(r)$ will
start its decrease towards zero at the turnaround radius. Extending
the range of integration still further the average decreases
since $\alpha$ now has changed sign. Ultimately, the value
of this integral will become large, and even more so beyond
the $\sim 10$ Mpc/h scale of the cluster-galaxy cross-correlation
function, but this would not be seen as a shift of the bell-shaped
enhancement of the redshift distribution.
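
The order-of-magnitude estimate can be reproduced directly. This sketch evaluates only the scaling $H^2 r_\perp^2/c^2 Z$, with $\alpha$ and $\delta(Z) - 1$ taken to be of order unity as assumed in the argument above, and with illustrative variable names:

```python
C_KMS = 299792.458   # speed of light, km/s

# At r_perp = 1 Mpc/h the product H * r_perp is 100 km/s,
# independent of the value of h.
hubble_velocity = 100.0   # km/s
Z = 0.2                   # representative cluster redshift

beta_x = (hubble_velocity / C_KMS) ** 2 / Z   # dimensionless offset
velocity = beta_x * C_KMS                     # equivalent velocity, km/s

print(f"<beta_x> ~ {beta_x:.1e}")
print(f"velocity ~ {velocity:.2f} km/s")
```

This is more than an order of magnitude below the other effects considered at the same impact parameter.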

The secular infall or outflow can also couple to the 
time rate of change of the phase-space distribution func- 
tion, which will also be changing on a cosmological timescale 
(both the density and the width of the velocity distribu- 
tion will, in general, be varying). This will result in a for- 
mally similar contribution to the redshift offset, but without 
the factor 1/Z, so for low-redshift clusters this will be still 
smaller. 

At larger impact parameter the effect of outflow is more
interesting. In figure 4 we show the line-of-sight velocity distribution
for a simple, but plausible, model for cluster density
and velocity structure in the out-flow region. The key
assumption here is that outside of turnaround the radial
velocity is $v \simeq H(r - r_t)$, which is supported by the analysis
of peculiar velocity profiles in numerical simulations (Ceccarelli
et al., 2011). The amplitude of the effect has been
exaggerated in figure 4 by a factor 10 for clarity. The
average of $2H(\delta(Z) - 1)/cZ$ over the distribution of galaxy
redshifts shown in figure 2 is approximately 0.0023/Mpc.
Scaling the mean velocity from the model appropriately
yields an expected blue-shift driven by the out-flow of about
$\delta z \simeq -3.0$ km/s/c at impact parameter $r_\perp \simeq 10$ Mpc. Over
the range of impact parameters explored by WHH the effect
of infall and outflow is very small.

Figure 4. Distribution of line of sight velocities at impact parameter
$r_\perp = 10$ Mpc for a simple model where $\xi_{cg} = (r/r_0)^{-2}$
with correlation length $r_0 = 10$ Mpc and turnaround radius
$r_t = 8.7$ Mpc and where the velocity is $v = H(r - r_t)$. A linear
ramp determined from the outermost points has been subtracted.
These were generated using equation 13 with parameter
$2H(\delta(Z) - 1)/cZ = 0$ (thin line) and 0.023/Mpc (thick line). This
is highly exaggerated. For clusters at Z = 0.2, or averaged over the
distribution of galaxy redshifts, this parameter is $\simeq 0.0023$. Scaling
the mean velocity appropriately gives $\delta z \simeq -3.0$ km/s/c. This
is for cold spherical outflow. In reality this will be convolved with
a broad quasi-Gaussian distribution of random velocities from local
substructures but the shift of the centroid will be essentially
the mean of the distribution shown here.



7 DISCUSSION 

We have shown that, in addition to the transverse Doppler 
effect, there are additional factors that need to be taken 
into account in interpreting the measurement of the offset 
of the net blue-shift of the cluster galaxies relative to the 
central brightest cluster galaxy. These are straightforward to 
estimate from the measured line-of-sight velocities and can 
therefore be subtracted from the measured blue-shift. The 
TD effect is a little more difficult to estimate, as it depends 
to some degree on the velocity dispersion anisotropy, but 
as it is a relatively small effect for the WHH experiment 
little error is made if we assume isotropic orbits. We have 
also shown that the redshift offset for unresolved sources is 
different again; the kinematically sourced effect is a blue- 
shift that is just the opposite of the standard transverse 
Doppler term. 



We have applied these results to the WHH measure- 
ment. We have used an empirically motivated model for the 
composite cluster halo mass density profile together with the 
observed velocity dispersions to predict the net redshift off- 
set. The largest correction comes from the surface brightness 
modulation effect. This is roughly equal to the GR effect at 
small impact parameters, and, since the velocity dispersion 
is falling with radius, this flattens out the blue-shift profile. 
The result, it has to be admitted, does not seem to agree as 
well with the data as the GR prediction alone. 

The current data do not place particularly strong con- 
straints on theories that invoke long-range non-gravitational 
interactions in the dark sector. However, the observational 
situation has already improved substantially with nearly 
three times as many galaxy redshifts obtained by the Sloan 
telescope once one includes the extensions such as BOSS 
(Dawson et al, 2013), and in the near future there will be yet
more data available, from surveys such as BigBOSS and also
potentially from ASKAP and APERTIF in the radio (Duffy
et al, 2012), to strengthen this test of fundamental physics.

ZPL suggested that, in principle, one could use the dif- 
ference in redshift offsets observed for X-ray gas, assuming 
that can be done, and measurements of galaxies as a probe 
of the anisotropy of the velocity dispersion tensor in clusters. 
The analysis here shows that the strength of the kinemati- 
cally sourced redshifts depends on the luminosity weighting 
scheme adopted, whereas, to the extent that the shape of the 
luminosity function is independent of position, this would 
not bias the measurement of the gravitational redshift. This 
provides, in principle, another way to constrain the orbital 
anisotropy. 

APPENDIX A: PREDICTING THE REDSHIFT DISTRIBUTION

We have estimated above the effects on the mean redshift.
However, what is actually measured is not a simple centroid,
since there are foreground and background galaxies. What
WHH did was to fit the distribution of redshifts relative to
the BCG to a model with a background consisting of a linear
ramp and a cluster consisting of a double Gaussian. Given
a theoretical model for the PSD, either analytic or obtained 
by stacking clusters found in a simulation, one would like to 
generate the predicted distribution of redshifts as a function 
of impact parameter. A convenient way to do this is to note 
that the observed redshift z expressed as a recession velocity 
is, as before, $\beta'_x = -z = \beta_x - \beta_x^2/2 - \beta_\perp^2/2 + \Phi/c^2$, where
we have now separated $\beta^2$ into line of sight and transverse
components. Thus $d\beta'_x = (1 - \beta_x)\,d\beta_x$; i.e. the Jacobian of
the transformation from velocity $\beta$, with respect to the
rest-frame observers, to measured redshift $\beta'$ is $1 - \beta_x$.
Conservation of particles requires that the observed density of
particles as a function of position, redshift and transverse
velocity, $\rho'(r, \beta'_x, \beta_\perp)$, satisfies
$\rho'(r, \beta'_x, \beta_\perp)\,d\beta'_x = \rho_{\rm LC}(r, \beta_x, \beta_\perp)\,d\beta_x$,
so

$$\rho'(r, \beta'_x, \beta_\perp) = \rho_{\rm rf}(r, \beta'_x + \beta_x^2/2 + \beta_\perp^2/2 - \Phi/c^2, \beta_\perp) \tag{A1}$$

i.e. the density of objects in position, radial and transverse
velocities is a mapping of the rest-frame PSD with a displacement
along the $\beta_x$ axis. Note that $\beta_\perp^2$ here denotes the
sum of the squares of the two transverse velocity components.
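
The mapping above can be illustrated with a minimal Monte Carlo sketch. It assumes isotropic Gaussian velocities, a constant potential, and a per-galaxy light-cone weight of 1 - beta_x, which is what the step from the light-cone density to the rest-frame density implies. The function name and the numerical values (a dispersion of 0.003 in units of c and Phi/c^2 = -1e-5) are illustrative assumptions, not values taken from WHH. Antithetic pairs (beta_x, -beta_x) are used only to suppress first-order sampling noise. Under these assumptions the weighted mean redshift should approach 5 sigma^2/2 - Phi/c^2.

```python
import random


def mean_redshift_offset(sigma, phi_over_c2, n_pairs=200_000, seed=0):
    """Light-cone weighted mean of z = -beta'_x for Gaussian velocities.

    sigma       : 1D velocity dispersion in units of c (assumed isotropic)
    phi_over_c2 : potential Phi/c^2, taken constant (a simplification)
    """
    rng = random.Random(seed)
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(n_pairs):
        b = rng.gauss(0.0, sigma)
        # Sum of squares of the two transverse velocity components.
        beta_perp2 = rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2
        for beta_x in (b, -b):  # antithetic pair
            # Observed redshift expressed as a recession velocity, beta'_x = -z.
            beta_x_obs = beta_x - 0.5 * beta_x**2 - 0.5 * beta_perp2 + phi_over_c2
            # Light-cone weight: the observed sample over-represents beta_x < 0.
            weight = 1.0 - beta_x
            weighted_sum += weight * beta_x_obs
            weight_total += weight
    return -weighted_sum / weight_total


sigma = 0.003          # hypothetical dispersion, roughly 900 km/s
phi_over_c2 = -1.0e-5  # hypothetical potential depth
estimate = mean_redshift_offset(sigma, phi_over_c2)
analytic = 2.5 * sigma**2 - phi_over_c2
print(estimate, analytic)
```

Dropping the weight (setting it to 1) removes the line-of-sight term and leaves only the transverse Doppler and gravitational contributions, which is a quick way to see how much of the offset comes from the light-cone effect in this toy setup.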

We can now expand the RHS as a Taylor series for small
displacement. We also want to evaluate this at $t = x/c$,
which we can also treat as a small displacement, resulting in

$$\rho'(r, \beta_x, \beta_\perp) = \rho(r, \beta_x, \beta_\perp) + \left(\frac{\beta_x^2}{2} + \frac{\beta_\perp^2}{2} - \frac{\Phi}{c^2}\right) \frac{\partial \rho(r, \beta_x, \beta_\perp)}{\partial \beta_x} + \frac{x}{c}\,\dot{\rho}(r, \beta_x, \beta_\perp) \tag{A2}$$

where dot denotes partial derivative with respect to time,
and we have dropped the prime on $\beta_x$. Integrating over the
transverse velocity components gives



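$$\rho'(r, \beta_x) = \rho(r, \beta_x) + \left(\frac{\beta_x^2}{2} + \frac{\langle\beta_\perp^2\rangle}{2} - \frac{\Phi}{c^2}\right) \frac{\partial \rho(r, \beta_x)}{\partial \beta_x} + \frac{x}{c}\,\dot{\rho}(r, \beta_x). \tag{A3}$$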



As a sanity check, if we ignore the last term, multiply by
$\beta_x$, and integrate over space and velocity, assuming $\rho$ to be
an even function of its arguments, we find $\delta z = \langle -\beta_x \rangle =
\langle\beta_x^2\rangle + \langle\beta_x^2 + \beta_\perp^2\rangle/2 - \Phi/c^2$, in accord with equation (5).
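
The intermediate step of this check can be written out explicitly. The following is a sketch at a fixed position, under assumptions that are ours for the purpose of illustration: $\rho$ is normalised to unity in $\beta_x$, it is even in $\beta_x$ and vanishes at large $|\beta_x|$, and $\langle\beta_\perp^2\rangle$ and $\Phi$ are treated as constants in the velocity integral. Only integration by parts on the derivative term is used:

$$
\begin{aligned}
\int d\beta_x\, \beta_x\, \rho'
&= \int d\beta_x\, \beta_x\, \rho
 + \int d\beta_x\, \beta_x \left(\frac{\beta_x^2}{2} + \frac{\langle\beta_\perp^2\rangle}{2} - \frac{\Phi}{c^2}\right) \frac{\partial \rho}{\partial \beta_x} \\
&= 0 - \int d\beta_x\, \rho\, \frac{\partial}{\partial \beta_x}\left[\frac{\beta_x^3}{2} + \beta_x\left(\frac{\langle\beta_\perp^2\rangle}{2} - \frac{\Phi}{c^2}\right)\right] \\
&= -\left(\frac{3}{2}\langle\beta_x^2\rangle + \frac{\langle\beta_\perp^2\rangle}{2} - \frac{\Phi}{c^2}\right).
\end{aligned}
$$

Changing the sign gives $\delta z = \langle\beta_x^2\rangle + \langle\beta_x^2 + \beta_\perp^2\rangle/2 - \Phi/c^2$; the extra $\langle\beta_x^2\rangle$ relative to the transverse Doppler term comes from differentiating the $\beta_x^3/2$ piece.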

We could integrate this expression over line of sight dis- 
tance to get the distribution function for the observed red- 
shift as a function of the impact parameter, but that would 
not properly allow for the fact that we observe in a cone, nor 
would it incorporate the surface brightness boosting effects. 
Both of these can be allowed for simply by multiplying the 
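first term on the RHS by the factors $1 + (3 + \alpha(Z))\delta(Z)\beta_x$
and $1 - 2Hx(\delta(Z) - 1)/cZ$. Linearising the result gives:

$$\rho'(r_\perp, \beta_x) = \rho(r_\perp, \beta_x) + \int dx \left\{ \left(\frac{\beta_x^2}{2} + \frac{\langle\beta_\perp^2\rangle}{2} - \frac{\Phi}{c^2}\right) \frac{\partial \rho(r, \beta_x)}{\partial \beta_x} + \left((3 + \alpha(Z))\delta(Z)\beta_x - \frac{2Hx(\delta(Z) - 1)}{cZ}\right)\rho(r, \beta_x) + \frac{x}{c}\,\dot{\rho}(r, \beta_x) \right\}. \tag{A4}$$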



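This is valid for an individual cluster. If we average over
the population of clusters and denote averaged properties as
e.g. $\bar{\rho} = \int dC\, P(C)\rho / \int dC\, P(C)$ then we have

$$\bar{\rho}'(r_\perp, \beta_x) = \bar{\rho}(r_\perp, \beta_x) + \int dx \left\{ \frac{\beta_x^2}{2} \frac{\partial \bar{\rho}(r, \beta_x)}{\partial \beta_x} + \frac{\partial}{\partial \beta_x}\, \overline{\left(\frac{\langle\beta_\perp^2\rangle}{2} - \frac{\Phi}{c^2}\right)\rho(r, \beta_x)} + \left((3 + \alpha(Z))\delta(Z)\beta_x - \frac{2Hx(\delta(Z) - 1)}{cZ}\right)\bar{\rho}(r, \beta_x) + \frac{x}{c}\,\dot{\bar{\rho}}(r, \beta_x) \right\}. \tag{A5}$$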



With an ensemble average cluster PSDF, along with the
average of this times $\langle\beta_\perp^2\rangle/2 - \Phi/c^2$ from e.g. a cosmological
simulation, this expression, after integrating along the line of
sight, provides the predicted distribution function for the
observed redshifts, which can then be analysed in precisely
the same way as the real data (e.g. finding the shift of the
velocity distribution by modelling) to obtain the predicted
redshift offset as a function of impact parameter. This would
also allow comparison of predicted and observed higher order
moments of the velocity distribution such as skewness and
kurtosis.
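
As a rough illustration of how moments of the predicted distribution could be evaluated, the sketch below tabulates a displaced distribution of the form rho' = rho + (beta_x^2/2 + K) d(rho)/d(beta_x) on a grid and sums its moments. It is a toy case under our own assumptions: a single Gaussian line-of-sight distribution, a constant K standing in for the transverse and potential terms, and no time-derivative or surface brightness terms. The function name, grid settings and numerical values are hypothetical.

```python
import math


def predicted_moments(sigma, k_shift, n_grid=4001, half_width=8.0):
    """Norm, mean and raw third moment of the displaced Gaussian distribution.

    sigma   : line-of-sight dispersion in units of c (Gaussian assumed)
    k_shift : constant standing in for <beta_perp^2>/2 - Phi/c^2
    """
    step = 2.0 * half_width * sigma / (n_grid - 1)
    norm = mean = third = 0.0
    for i in range(n_grid):
        beta = -half_width * sigma + i * step
        rho = math.exp(-0.5 * (beta / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        drho = -beta / sigma**2 * rho  # analytic derivative of the Gaussian
        rho_obs = rho + (0.5 * beta**2 + k_shift) * drho
        # Simple Riemann sums; the integrand vanishes at the grid edges.
        norm += rho_obs * step
        mean += beta * rho_obs * step
        third += beta**3 * rho_obs * step
    return norm, mean / norm, third / norm


sigma = 0.003                # hypothetical dispersion
k_shift = sigma**2 + 1.0e-5  # isotropic <beta_perp^2>/2 plus an assumed -Phi/c^2
norm, mean, third = predicted_moments(sigma, k_shift)
print(norm, -mean, third)
```

For a Gaussian the mean should equal -(3 sigma^2/2 + K) and the raw third moment -(15 sigma^4/2 + 3 K sigma^2), which gives a quick consistency check before replacing the Gaussian with a distribution measured from simulations.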



